-
-
-
- CookieTool V2.3
- ===============
-
- A team of programs to help you maintain your cookie database:
-
- "CookieTool" itself eliminates duplicate entries,
- sorts cookies alphabetically or by size if you wish.
- "CdbSplit" extracts parts of the database to a separate file,
- by keyword, by size, by number, or as groups of 'similar' cookies.
-
-
-
- 0. Who needs it?
- ----------------
-
- These tools are intended for users of "Cookie", "IntuiCookie" (both
- available on Aminet, util/misc/), or generally for any plain text cookie
- database with entries separated by "%%" lines. They are nice for
- crunching your cookie collection by a few KByte of duplicate stuff, but
- also for splitting it into separate files, for example extracting
- quotations or dumping those cookies too big to be fun to read.
-
- Note that "CookieTool" and "CdbSplit" know how to handle the database
- itself, but not how to update the corresponding index file (also called
- 'hash file'). That means you still need "cookhash" (which should be
- included with your cookie display program).
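-
- For illustration, here is a minimal Python sketch of the file format
- described above (plain text, entries separated by "%%" lines). The names
- read_cookies and write_cookies are made up for this example and are not
- part of CookieTool; skipping empty entries is an assumption, too.
-
```python
def read_cookies(text):
    """Split a cookie database into a list of cookies (strings).

    Assumption: a line consisting only of "%%" ends the current cookie,
    and empty entries are skipped.
    """
    cookies, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                cookies.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:                      # last cookie without a trailing "%%"
        cookies.append("\n".join(current))
    return cookies


def write_cookies(cookies):
    """Join cookies back into database text, one "%%" line after each."""
    return "".join(cookie + "\n%%\n" for cookie in cookies)


sample = ("Kill ugly radio.\n        -- Frank Zappa\n%%\n"
          "A penny saved is a penny earned.\n%%\n")
cookies = read_cookies(sample)
print(len(cookies))
print(write_cookies(cookies) == sample)
```
-
- The index ('hash file') is not modelled here; as noted above, it still
- has to be rebuilt with "cookhash" after the database changes.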
-
-
- 1. CookieTool command summary
- -----------------------------
-
- cookietool [options] <cookiefile> [logfile]
-
- The crunched cookie database will be WRITTEN BACK to the input file (quite
- different from cookietool V1.x behaviour). The deleted cookies will be
- written to <logfile>, if one is specified. (Thus one could restore the
- original database by appending the logfile to the cookiefile again.)
-
-   options:   meaning:
-    -c         case-sensitive comparisons (for both deleting and sorting)
-    -d[0-3]    how fussy about word delimiters?
-                 -d3: strict, compare character by character
-                 -d2: ignore number and kind of spaces between words (DEFAULT)
-                 -d1: treat punctuation signs as spaces, too
-                 -d0: completely ignore punctuation signs and spaces
-    -a         delete cookies that are "abbreviations" of another, too
-    -p         passive, don't delete anything
-    -s         sort output
-    -sl        sort output, looking at the last line only
-    -sw        sort output, looking at the last word only
-    -s<sep>    sort output, starting at the last occurrence of <sep>,
-               e.g. '-s--' or '-s...'
-               (-sl, -sw and -s<sep> are intended to sort quotations by source)
-    -ss        sort by size
-    -o         overwrite the input file directly (no tempfile), risky!
-               Use this *only* if your disk is so full that cookietool
-               couldn't create its tempfile.
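-
- The three "sort by source" variants can be pictured as different sort
- keys. The following Python sketch is only a rough model: the function
- names are invented, and details such as case folding or what happens
- when <sep> does not occur in a cookie are guesses, not CookieTool's
- documented behaviour.
-
```python
def key_last_line(cookie):
    """-sl: look at the last non-empty line only."""
    lines = [ln.strip() for ln in cookie.splitlines() if ln.strip()]
    return lines[-1].lower() if lines else ""


def key_last_word(cookie):
    """-sw: look at the last word only."""
    words = cookie.split()
    return words[-1].lower() if words else ""


def key_from_separator(cookie, sep):
    """-s<sep>: start at the last occurrence of sep.

    Assumption: cookies without sep are compared by their full text.
    """
    pos = cookie.rfind(sep)
    return cookie[pos:].lower() if pos >= 0 else cookie.lower()


cookies = [
    "Kill ugly radio. -- Frank Zappa",
    "Some saying. -- Anonymous",
    "Another saying. -- Frank Zappa",
]
# Equivalent in spirit to "cookietool cookies -s--"
by_source = sorted(cookies, key=lambda c: key_from_separator(c, "--"))
for cookie in by_source:
    print(cookie)
```
-
- Sorting this way puts cookies with the same source next to each other,
- which is what makes the later "find similar cookies" examples work.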
-
-
- 2. CdbSplit command summary
- ---------------------------
-
- cdbsplit [options] <cookiefile> <hitfile>
-
- By default, the input file will be OVERWRITTEN by a reduced version of
- the database, so that cookies are moved (not copied) to the hit file;
- use -x to leave the input file untouched.
- An existing hit file will never be overwritten, but may be appended to.
-
-   options:    meaning:
-    -c          case-sensitive comparisons (for both keywords and groups)
-    -d[0-3]     how fussy about word delimiters? (see above for details)
-    -k<keywd>   optional keyword
-    -K<keywd>   mandatory keyword
-                (combine -k and -K to form simple boolean expressions)
-    -l<l_min>   accept only cookies with <l_min> lines or more
-    -L<l_max>   accept only cookies with <l_max> lines or fewer
-    -w<w_min>   accept only cookies <w_min> chars wide or more
-    -W<w_max>   accept only cookies <w_max> chars wide or less
-    -n<n_min>   start at cookie number <n_min>
-    -N<n_max>   stop after cookie number <n_max>
-    -m<n>       find groups of cookies starting with <n> matching characters
-                (database must have been sorted for this to make sense!)
-    -x          extract only, don't modify <cookiefile>
-    -a          append, if <hitfile> exists (instead of failing)
-
-
- 3. Examples
- -----------
-
- These examples assume that your cookie database is in a single file
- called "cookies" (tacky name, hah :). Oh, and I'd suggest that you make
- a backup of your cookies somewhere before trying "cookietool" on them.
-
-
- 3.1. Do what "onecookie" used to do
- -----------------------------------
-
- The classic "onecookie" could only delete verbatim copies of a cookie,
- where even two spaces instead of one would make a difference. CookieTool
- can be told to behave like this, too:
-
- cookietool cookies -c -d3
-
- The default settings are a bit more generous:
-
- cookietool cookies
-
- might delete a few cookies more. Upper- and lowercase letters are now
- considered the same, and it doesn't matter if two words are separated by
- one or several spaces, by a tab character, by a line break, etc. So two
- copies of the same text, but formatted in different ways, will still be
- recognized as identical.
-
- The question is: do you really want such copies deleted automatically, or
- would you rather decide yourself which one of such *almost* identical
- cookies should be deleted? This question arises even more with the really
- liberal settings, such as
-
- cookietool cookies -d0
-
- which for example recognizes "Kill ugly radio. -- Frank Zappa" and
- "Kill ugly radio... Frank Zappa" as identical. (Both of these two styles
- of supplying sources to quotations are frequently used.) More on that
- question later.
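-
- To make the four -d levels more concrete, here is a Python sketch of how
- a cookie might be reduced before comparison. This is a model built from
- the option descriptions only, not CookieTool's actual code; the function
- name normalize is hypothetical, and "punctuation" is assumed to mean the
- ASCII punctuation characters.
-
```python
import string

# Maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize(text, d=2, case_sensitive=False):
    """Reduce a cookie to the form compared at level -d<d>."""
    if not case_sensitive:           # default; -c switches this off
        text = text.lower()
    if d >= 3:                       # -d3: compare character by character
        return text
    if d <= 1:                       # -d1 and -d0: punctuation counts as space
        text = text.translate(PUNCT_TO_SPACE)
    words = text.split()             # any run of spaces, tabs, line breaks
    if d == 0:                       # -d0: ignore the spaces as well
        return "".join(words)
    return " ".join(words)           # -d2 and -d1: exactly one space


a = "Kill ugly radio. -- Frank Zappa"
b = "Kill ugly radio... Frank Zappa"
print(normalize(a, d=0) == normalize(b, d=0))   # same at -d0
print(normalize(a, d=2) == normalize(b, d=2))   # different at the default
```
-
- With the default level, "Kill  ugly\nradio." and "kill ugly radio." also
- reduce to the same string, which is the reformatting case described
- above.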
-
-
- 3.2. Deleting abbreviations
- ---------------------------
-
- It occurs rather frequently that one cookie seems to be an "abbreviation"
- of another. Sayings may consist of more than one sentence, but the first
- sentence is sometimes quoted by itself. And quotations are sometimes
- written down with, sometimes without their author. In both cases the
- shorter cookie may be deleted, and cookietool can do that, too (-a).
-
- However, one should not ignore punctuation signs with this option (don't
- use -d1 or -d0), because cookietool would then consider "A penny saved is
- a penny." to be an abbreviation of "A penny saved is a penny earned.",
- which is not desirable. It might be a good idea to create a log file of
- the deleted cookies and look at least at the shortest ones among them:
-
- cookietool -a cookies log ; extract to "log" rather than just deleting
- cookietool log -ss ; sort the extracted cookies by size
- Ed log ; check if some are worth keeping and delete the rest
- cdbsplit log cookies -a ; put the survivors back
-
- Using 'cdbsplit -a' without any search options is a nice way of moving
- cookies back into your main database. (Note that "Type log >>cookies",
- "Delete log" would essentially do the same, but is risky: If you
- accidentally type '>' instead of '>>', that would overwrite your main
- database instead of appending to it! Such a thing can't happen with
- cdbsplit -a.)
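-
- The penny example can be reproduced with a small Python sketch. It
- assumes that an "abbreviation" is simply a cookie whose (normalized) text
- is the beginning of a longer cookie; that reading fits both cases
- described above, but it is an assumption, and is_abbreviation is not a
- real CookieTool function.
-
```python
def is_abbreviation(short, long_, ignore_punctuation=False):
    """True if `short` is a proper prefix of `long_` after normalization."""
    def norm(text):
        text = text.lower()
        if ignore_punctuation:       # roughly what -d1 would do
            text = "".join(ch if ch.isalnum() or ch.isspace() else " "
                           for ch in text)
        return " ".join(text.split())

    a, b = norm(short), norm(long_)
    return len(a) < len(b) and b.startswith(a)


short = "A penny saved is a penny."
full = "A penny saved is a penny earned."
print(is_abbreviation(short, full))                           # punctuation kept
print(is_abbreviation(short, full, ignore_punctuation=True))  # punctuation ignored
print(is_abbreviation("Kill ugly radio.",
                      "Kill ugly radio. -- Frank Zappa"))
```
-
- With punctuation kept, the full stop after "penny" prevents the false
- match, while a quotation without its author is still caught.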
-
-
- 3.3. Move cookies to and fro between files
- ------------------------------------------
-
- Let's say you want to keep cookies which are quotations in a separate
- file. That's easy: they can be recognized by the "--" which precedes
- the source of the saying:
-
- cdbsplit cookies quotes -k--
-
- Another example: You might want to move all Bart Simpson quotes to a
- separate "simpsons" file. That's a little trickier, as "Bart" is a
- rather short keyword, which might appear as part of other words as well.
- Try three passes, cautious at first, then more generous to make sure you
- get them all:
-
- cdbsplit cookies simpsons -KBart -KSimpson
-
- This collects cookies with both "Bart" and "Simpson" in them (note the
- capital -K!). I can't imagine anything going wrong here.
-
- cdbsplit cookies simps2 "-kBart " -d1 -c
-
- Note how the -d1 in this second command will make "Bart!" but not
- "Barton" be identified as "Bart ". But as this keyword fails if "Bart"
- appears at the very end of a cookie, you still have to collect the rest:
-
- cdbsplit cookies simps3 -kBart
-
- Now look at the "simps2" and "simps3" files and check if anything went
- wrong with them. In my case, I found a quotation by a guy named "Barth".
- It's easy to put it back:
-
- cdbsplit simps3 cookies -kBarth -a
-
- Finally, put the three hit files together:
-
- cdbsplit simps2 simpsons -a
- cdbsplit simps3 simpsons -a
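-
- Why "-kBart " with -d1 catches "Bart!" but neither "Barton" nor a "Bart"
- at the very end of a cookie can be seen in the following Python sketch.
- It models -d1 and -c as described in the option tables; the function
- names are made up, and the way -k and -K are combined here (all
- mandatory keywords, plus at least one optional keyword if any are given)
- is an assumption about the "simple boolean expressions".
-
```python
import string

PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize_d1(text):
    """-d1: punctuation and runs of whitespace become a single space."""
    return " ".join(text.translate(PUNCT_TO_SPACE).split())


def matches(cookie, optional=(), mandatory=()):
    """Case-sensitive (-c) keyword test on the -d1 form of a cookie."""
    text = normalize_d1(cookie)
    if not all(k in text for k in mandatory):
        return False
    return not optional or any(k in text for k in optional)


print(matches("Bart! Stop that.", optional=["Bart "]))       # "Bart!" counts
print(matches("Barton said so.", optional=["Bart "]))        # "Barton" does not
print(matches("Eat my shorts, Bart!", optional=["Bart "]))   # missed: at the end
print(matches("Eat my shorts, Bart!", optional=["Bart"]))    # third pass gets it
```
-
- The plain "Bart" keyword of the third pass also matches "Barth", which
- is exactly the false hit mentioned above.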
-
-
- 3.4. Support for editing manually
- ---------------------------------
-
- CdbSplit can help you collect all cookies that need reformatting (because
- they are too wide) in an extra file, and put them back later:
-
- cdbsplit -w76 cookies wide
- Ed wide ; add some line breaks
- cdbsplit wide cookies -a
-
- Now this was easy. But cdbsplit can even help you to find groups of
- "similar" cookies! That's helpful to eliminate cookies that differ only
- by some typing error (e.g. 'seperate'/'separate'), something that
- cookietool will *never* handle automatically. To do this, you must sort
- your database first, then tell cdbsplit how many agreeing characters make
- "similar" cookies (I think 10 - 20 characters is usually a good choice):
-
- cookietool cookies -s -d0 -p
- cdbsplit cookies temp -d0 -m20
- Ed temp ; delete some manually
- cdbsplit temp cookies -a
-
- When editing the "temp" file, you should find groups of two or more
- cookies with identical beginnings. If you think they are really the same,
- you can delete all but one (!) of each group. This is tedious work,
- but hell, it's far easier than just sorting the database and looking
- for similar cookies with your eyes only! :)
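-
- The idea behind -m can be sketched in a few lines of Python: once the
- database is sorted, cookies with the same beginning are neighbours, so
- it is enough to collect runs of adjacent cookies whose first <n>
- characters agree. The helper names are hypothetical, and squeeze() is
- only a rough stand-in for -d0.
-
```python
from itertools import groupby


def squeeze(text):
    """Rough stand-in for -d0: ignore case, punctuation and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def similar_groups(sorted_cookies, n, key=squeeze):
    """Return runs of two or more adjacent cookies whose first n
    normalized characters agree. The input must already be sorted."""
    groups = []
    for _, run in groupby(sorted_cookies, key=lambda c: key(c)[:n]):
        run = list(run)
        if len(run) > 1:
            groups.append(run)
    return groups


cookies = sorted([
    "Keep them in a seperate file.",
    "Something else entirely.",
    "Keep them in a separate file.",
], key=squeeze)

for group in similar_groups(cookies, 12):
    print(group)
```
-
- In this tiny example the two spellings agree in their first 14 letters
- only, so -m20 would miss them while -m12 finds them. A smaller <n>
- catches more typing errors but also produces more false groups.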
-
- Here's a more sophisticated procedure that will extract groups of cookies
- starting and ending with the same word (well, almost):
-
- cookietool cookies -s -d1 -p ; regular sorting first
- cookietool cookies -sw -d1 -p ; *then* sort by last word
- cdbsplit cookies temp -d1 -m3 ; yes, 3 matching characters will do!
- Ed temp ; delete all but one from each group
- cdbsplit temp cookies -a ; put the others back
-
- Applying -s-- instead of -sw in the second pass could help you find
- similar sayings that are attributed to the same person.
-
-
- 3.5. Joining "good" and "bad" cookie files
- ------------------------------------------
-
- Suppose you have a well maintained cookie database, without double
- entries, all the cookies are formatted the way you want them, and all the
- authors of quotations are written down in your preferred style. Now you
- find an archive with new cookies somewhere and you want to add them to
- your database, but you have reason to believe that this will introduce a
- lot of double entries. Here's how I would proceed.
-
- In the following, assume that your good cookies are in a file called
- "cookies", the new cookies are in a file called "visitors".
-
- First make sure there are no double entries left in your main file, at
- least none that cookietool can find:
-
- cookietool cookies -d0 log
-
- Note the number of cookies that cookietool reports (suppose it's 4711);
- you'll need it later. (By the way, normally this pass shouldn't delete
- anything if your database is really in such good shape. And don't worry
- if it did: those cookies are in the "log" file now. But if you want to
- put them back, please do that only after this procedure is complete!)
-
- Now append the "visitors" file, then delete all doubles from the new and
- larger "cookies" file:
-
- cdbsplit visitors cookies -a
- cookietool cookies -d0
-
- This will delete only new cookies (if any), because cookietool starts
- deleting from the end of the file. Of course, for this to work, it is
- essential that you assemble the files in this order (i.e. don't append
- "cookies" to "visitors")!
-
- Finally you might want to move the new cookies to their own file again.
- That's easy: tell cdbsplit to extract all but the first 4711:
-
- cdbsplit cookies visitors -a -n4712
-
- Now you can look at "visitors" to see what you've got, edit and reformat
- where needed, and then finally join the two databases for good.
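-
- The whole procedure can be replayed on a toy example in Python. The
- sketch assumes what the text above states, namely that of two duplicates
- the one nearer the end of the file is deleted, and that unsorted output
- keeps the original order. squeeze() is a rough stand-in for -d0, and all
- names are invented for this example.
-
```python
def squeeze(text):
    """Rough stand-in for -d0: ignore case, punctuation and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def dedupe_keep_first(cookies):
    """Drop every cookie whose squeezed text was already seen earlier,
    so that of two duplicates the later one is deleted."""
    seen, kept = set(), []
    for cookie in cookies:
        k = squeeze(cookie)
        if k not in seen:
            seen.add(k)
            kept.append(cookie)
    return kept


good = ["Kill ugly radio. -- Frank Zappa", "A penny saved is a penny earned."]
visitors = ["Kill ugly radio... Frank Zappa", "Look before you leap."]

good = dedupe_keep_first(good)                 # cookietool cookies -d0 log
count = len(good)                              # the "4711" of the text
merged = dedupe_keep_first(good + visitors)    # append, then cookietool -d0
fresh = merged[count:]                         # cdbsplit ... -n<count + 1>
print(fresh)
```
-
- Appending in the other order would make your own, carefully formatted
- copy the "later" duplicate, and it would be deleted instead.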
-
-
- 3.6. Extract all poems :)
- -------------------------
-
- Would you agree that a poem is something that has at least four lines,
- but doesn't use the full line width? So let's try this:
-
- cdbsplit -l4 -W60 cookies poems
-
- You should check the contents of "poems" manually now, and maybe you will
- want to move some of the wider cookies back. Not a problem:
-
- cdbsplit poems cookies -w51 -a
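-
- The size options boil down to a simple test on the number of lines and
- on the width of a cookie. A Python sketch, assuming that "width" means
- the length of the longest line (with a tab counted as one character);
- the function names are hypothetical.
-
```python
def shape(cookie):
    """Return (number of lines, length of the longest line)."""
    lines = cookie.splitlines()
    return len(lines), max((len(ln) for ln in lines), default=0)


def accept(cookie, l_min=None, l_max=None, w_min=None, w_max=None):
    """Mimic -l/-L/-w/-W: every limit that is given must hold."""
    n_lines, width = shape(cookie)
    return ((l_min is None or n_lines >= l_min) and
            (l_max is None or n_lines <= l_max) and
            (w_min is None or width >= w_min) and
            (w_max is None or width <= w_max))


poem = "Roses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you."
prose = "x" * 70 + "\nsecond line\nthird line\nfourth line"

print(accept(poem, l_min=4, w_max=60))    # like -l4 -W60
print(accept(prose, l_min=4, w_max=60))   # too wide
print(accept(poem, w_min=51))             # like -w51: not moved back
```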
-
-
- 4. Background information
- -------------------------
-
- Just like "onecookie", "cookietool" has to load the complete database into
- memory first. (Tough luck for those with a 1 Meg Amiga and a 1.2 MB
- database :-). But unlike "onecookie", "cookietool" doesn't compare each
- cookie against all the others (an O(n*n) operation); it sorts them first
- and then compares each one against its neighbours only (an O(n*log n)
- operation). For a database of 1000 cookies, that's about 100 times faster!
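-
- (The estimate follows from n*n / (n*log2 n) = n / log2 n, which is
- roughly 1000 / 10 = 100 for n = 1000.) The sort-then-compare-neighbours
- idea looks like this in Python. It is a sketch of the general technique,
- not of cookietool's C source; the function name and the use of
- lower-casing as the only normalization are assumptions.
-
```python
def dedupe_sorted(cookies, norm=str.lower):
    """Remove duplicates by sorting and comparing neighbours only."""
    keys = [norm(c) for c in cookies]
    # Sort the positions by normalized text. Python's sort is stable, so
    # equal cookies stay in file order within the sorted sequence.
    order = sorted(range(len(cookies)), key=lambda i: keys[i])
    drop = set()
    for prev, cur in zip(order, order[1:]):
        if keys[prev] == keys[cur]:
            drop.add(cur)            # the later copy goes, the earlier stays
    # Emit the survivors in their original file order.
    return [c for i, c in enumerate(cookies) if i not in drop]


print(dedupe_sorted(["b", "A", "a", "B", "c"]))
```
-
- Only n - 1 neighbour comparisons are needed after the sort, so the sort
- itself dominates the running time.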
-
- Overwriting input files is done by creating a tempfile and renaming it
- when all else is done. So breaking (or crashing) the programs won't
- lead to data loss. Unless, of course, you use cookietool with the '-o'
- option, but I already warned you about that! (For those who absolutely
- need to know: Breaking cookietool while it is still reading data is
- safe, even with -o, because the output file won't be opened until after
- all deleting and sorting is done. But please, kids, don't try this at
- home! Or better still: Don't use -o at all.)
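-
- The tempfile-and-rename scheme is a common pattern. A minimal Python
- sketch follows; it only shows the general technique, and the function
- name and the tempfile prefix are invented. Note that the temporary file
- is created in the same directory as the target, because a rename can
- only replace the target directly on the same volume.
-
```python
import os
import tempfile


def write_back_safely(path, text):
    """Replace the file at `path` with `text` via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cookietmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        os.replace(tmp_path, path)   # the old file stays intact until here
    except BaseException:
        # Interrupted or failed (e.g. disk full): discard the partial
        # tempfile and leave the original file as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    database = os.path.join(workdir, "cookies")
    with open(database, "w", encoding="utf-8") as f:
        f.write("old cookie\n%%\n")
    write_back_safely(database, "new cookie\n%%\n")
    with open(database, encoding="utf-8") as f:
        result = f.read()
    leftovers = os.listdir(workdir)

print(result)
```
-
- Writing directly into the input file, as -o does, skips the temporary
- copy and therefore has no intact original to fall back on.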
-
- Note that breaking "cdbsplit" while it is appending to another file is
- not a good idea. All cookies that were already copied are then present in
- both files, and most likely the output file even ends with an incomplete
- cookie! The same can happen through no fault of yours, if cdbsplit
- encounters a "Disk Full" error.
- In both cases, don't append any further data to this output file, or the
- first of the new cookies will be merged with that incomplete cookie, due
- to the missing %% separator! You might run "cookietool" once on the
- output file; that will ensure a valid file format again, and the
- incomplete cookie will be removed.
-
-
- 5. History
- ----------
-
- V1.0 -
- V1.3 forget them, they were all crap, too hard to use
-
- V2.0 no more reformatting of cookies, sorry for those who miss it :'(
-
- V2.1 fixed a bug that would unnecessarily lose data after "Disk Full"
- errors
-
- V2.2 added search for combinations of keywords and the -x option for
- CdbSplit, CookieTool can now sort by size
-
- V2.3 changed licensing to GPL, minor bugfix in CdbSplit (the -w option
- was off by one from its designed behaviour), cookie separator can be
- redefined as something other than "%%" at compile time (#define EOC),
- added short manpages for cookietool and cdbsplit
-
-
- 6. The author & license
- -----------------------
-
- Wilhelm Nöker <wnoeker@t-online.de>
- Hertastr. 8, D-44388 Dortmund
-
- CookieTool and CdbSplit are distributed under GNU General Public License
- version 2 or later.
-
- Drop me an e-mail if you like these programs, or if you want to suggest
- some more features.
-
-
- 7. Credits
- ----------
-
- CookieTool and CdbSplit were written using CygnusEd V4.2 and the GNU C
- compiler (with libnix).
-
- The man pages, the Unix makefile and the "GPL paperwork" for V2.3 were
- done by Miroslaw 'Jubal' Baran <baran@knm.org.pl>.
-
- Greetings to Christian Kemp (author of IntuiCookie and of the Amiga port
- of SmartAss, and, last but not least, maintainer of the great Amiga
- Network News web pages at www.ann.lu).
-
-